NSF PAR Search | NSF Public Access Repository

DSPO: Direct Score Preference Optimization for Diffusion Model Alignment.

Zhu, Huaisheng; Xiao, Teng; Honavar, Vasant G (July 2025, International Conference on Learning Representations (ICLR 2025))

Diffusion-based Text-to-Image (T2I) models have achieved impressive success in generating high-quality images from textual prompts. While large language models (LLMs) effectively leverage Direct Preference Optimization (DPO) for fine-tuning on human preference data without the need for reward models, diffusion models have not been extensively explored in this area. Current preference learning methods applied to T2I diffusion models directly adapt existing techniques from LLMs. However, this direct adaptation introduces an estimated loss specific to T2I diffusion models. Our empirical results suggest that this estimation can lead to suboptimal performance. In this work, we propose Direct Score Preference Optimization (DSPO), a novel algorithm that aligns the pretraining and fine-tuning objectives of diffusion models by leveraging score matching, the same objective used during pretraining. It introduces a new perspective on preference learning for diffusion models. Specifically, DSPO distills the score function of human-preferred image distributions into pretrained diffusion models, fine-tuning the model to generate outputs that align with human preferences. We theoretically show that DSPO shares the same optimization direction as reinforcement learning algorithms in diffusion models under certain conditions. Our experimental results demonstrate that DSPO outperforms preference learning baselines for T2I diffusion models in human preference evaluation tasks and enhances both visual appeal and prompt alignment of generated images.
Free, publicly-accessible full text available July 28, 2026
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Xiao, Teng; Yuan, Yige; Chen, Zhengyu; Li, Mingxiao; Liang, Shangsong; Ren, Zhaochun; Honavar, Vasant G (July 2025, Proceedings of the International Conference on Learning Representations (ICLR 2025))

Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models. In this paper, we propose a simple yet effective hyperparameter-free preference optimization algorithm for alignment. We observe that promising performance can be achieved simply by optimizing inverse perplexity, which is calculated as the inverse of the exponentiated average log-likelihood of the chosen and rejected responses in the preference dataset. The resulting simple learning objective, SimPER, is easy to implement and eliminates the need for expensive hyperparameter tuning and a reference model, making it both computationally and memory efficient. Extensive experiments on widely used real-world benchmarks, including MT-Bench, AlpacaEval 2, and 10 key benchmarks of the Open LLM Leaderboard with 5 base models, demonstrate that SimPER consistently and significantly outperforms existing approaches—even without any hyperparameters or a reference model. For example, despite its simplicity, SimPER outperforms state-of-the-art methods by up to 5.7 points on AlpacaEval 2 and achieves the highest average ranking across 10 benchmarks on the Open LLM Leaderboard. The source code for SimPER is publicly available at: https://github.com/tengxiao1/SimPER.
Free, publicly-accessible full text available July 28, 2026
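
The inverse-perplexity quantity named in the SimPER abstract above can be illustrated with a minimal sketch. Under the standard definition, perplexity is the exponential of the negative mean token log-likelihood, so its inverse is the exponential of the mean. The abstract does not give the exact loss, so the way the chosen and rejected terms are combined below, along with the function names `inverse_perplexity` and `simper_style_loss`, is an assumption for illustration only and not the authors' implementation.

```python
import math


def inverse_perplexity(token_logprobs):
    """Return exp(mean token log-likelihood), i.e. 1 / perplexity."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def simper_style_loss(chosen_logprobs, rejected_logprobs):
    """Hypothetical loss: lower when the chosen response has higher
    inverse perplexity than the rejected one.

    ASSUMPTION: the subtraction form is a guess based on the abstract;
    it uses no hyperparameters and no reference model.
    """
    return inverse_perplexity(rejected_logprobs) - inverse_perplexity(chosen_logprobs)


# Toy per-token log-probabilities (made up for illustration).
chosen = [math.log(0.5)] * 4
rejected = [math.log(0.25)] * 3
loss = simper_style_loss(chosen, rejected)
```

Because each inverse perplexity lies in (0, 1], this sketched loss is bounded and is normalized by response length, which is one reason such an objective might need no scaling hyperparameter.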
On a Connection Between Imitation Learning and RLHF

Xiao, Teng; Yuan, Yige; Li, Mingxiao; Chen, Zhengyu; Honavar, Vasant G (April 2025, International Conference on Learning Representations 2025 (ICLR 2025))

This work studies the alignment of large language models with preference data from an imitation learning perspective. We establish a close theoretical connection between reinforcement learning from human feedback (RLHF) and imitation learning (IL), revealing that RLHF implicitly performs imitation learning on the preference data distribution. Building on this connection, we propose DIL, a principled framework that directly optimizes the imitation learning objective. DIL provides a unified imitation learning perspective on alignment, encompassing existing alignment algorithms as special cases while naturally introducing new variants. By bridging IL and RLHF, DIL offers new insights into alignment with RLHF. Extensive experiments demonstrate that DIL outperforms existing methods on various challenging benchmarks. The code for DIL is available at https://github.com/tengxiao1/DIL.
Free, publicly-accessible full text available April 28, 2026
Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment

Xiao, Teng; Yuan, Yige; Zhu, Huaisheng; Li, Mingxiao; Honavar, Vasant G (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

We study the problem of aligning large language models (LLMs) with human preference data. Contrastive preference optimization has shown promising results in aligning LLMs with available preference data by optimizing the implicit reward associated with the policy. However, the contrastive objective focuses mainly on the relative values of implicit rewards associated with two responses while ignoring their actual values, resulting in suboptimal alignment with human preferences. To address this limitation, we propose calibrated direct preference optimization (Cal-DPO), a simple yet effective algorithm. We show that substantial improvement in alignment with the given preferences can be achieved simply by calibrating the implicit reward to ensure that the learned implicit rewards are comparable in scale to the ground-truth rewards. We demonstrate the theoretical advantages of Cal-DPO over existing approaches. The results of our experiments on a variety of standard benchmarks show that Cal-DPO remarkably improves off-the-shelf methods.
Full Text Available
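
The limitation described in the Cal-DPO abstract above, that a contrastive objective sees only the relative values of the two implicit rewards, can be checked numerically. The sketch below uses the familiar pairwise logistic form, -log sigmoid(r_chosen - r_rejected), as a stand-in for a contrastive preference loss. It does not implement Cal-DPO's calibration, and the function name and reward values are hypothetical.

```python
import math


def contrastive_loss(r_chosen, r_rejected):
    """Pairwise logistic loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-margin))


# Two reward pairs with the same margin but very different absolute values.
loss_high = contrastive_loss(2.0, 1.0)
loss_low = contrastive_loss(-8.0, -9.0)

# The loss is identical, so it cannot distinguish the two cases.
same = math.isclose(loss_high, loss_low)
```

Since shifting both rewards by the same constant leaves the loss unchanged, nothing in this objective anchors the absolute scale of the learned rewards, which is the gap the abstract says calibration is meant to close.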
GeomCLIP: Contrastive Geometry-Text Pre-training for Molecules

https://doi.org/10.1109/BIBM62325.2024.10822346

Xiao, Teng; Cui, Chao; Zhu, Huaisheng; Honavar, Vasant G (December 2024, IEEE)

Pretraining molecular representations is crucial for drug and material discovery. Recent methods focus on learning representations from geometric structures, effectively capturing 3D position information. Yet, they overlook the rich information in biomedical texts, which detail molecules’ properties and substructures. With this in mind, we set up a data collection effort for 200K pairs of ground-state geometric structures and biomedical texts, resulting in a PubChem3D dataset. Based on this dataset, we propose the GeomCLIP framework to enhance geometric pretraining and understanding by biomedical texts. During pre-training, we design two types of tasks, i.e., multimodal representation alignment and unimodal denoising pretraining, to align the 3D geometric encoder with textual information and, at the same time, preserve its original representation power. Experimental results show the effectiveness of GeomCLIP in various tasks such as molecule property prediction, zero-shot text-molecule retrieval, and 3D molecule captioning. Our code and collected dataset are available at https://github.com/xiaocui3737/GeomCLIP.
Full Text Available
Efficient Contrastive Learning for Fast and Accurate Inference on Graphs

Xiao, Teng; Zhu, Huaisheng; Zhang, Zhiwei; Guo, Zhimeng; Aggarwal, Charu C; Wang, Suhang; Honavar, Vasant G (July 2024, Proceedings of Machine Learning Research: International Conference on Machine Learning)

Graph contrastive learning has made remarkable advances in settings where there is a scarcity of task-specific labels. Despite these advances, the significant computational overhead for representation inference incurred by existing methods that rely on intensive message passing makes them unsuitable for latency-constrained applications. In this paper, we present GraphECL, a simple and efficient contrastive learning method for fast inference on graphs. GraphECL does away with the need for expensive message passing during inference. Specifically, it introduces a novel coupling of the MLP and GNN models, where the former learns to efficiently mimic the computations performed by the latter. We provide a theoretical analysis showing why an MLP can capture essential structural information in neighbors well enough to match the performance of a GNN in downstream tasks. Extensive experiments on widely used real-world benchmarks show that GraphECL achieves superior performance and inference efficiency compared to state-of-the-art graph contrastive learning (GCL) methods on homophilous and heterophilous graphs. Code is available at: https://github.com/tengxiao1/GraphECL.
Full Text Available
Towards Fair Graph Neural Networks via Graph Counterfactual

https://doi.org/10.1145/3583780.3615092

Guo, Zhimeng; Li, Jialiang; Xiao, Teng; Ma, Yao; Wang, Suhang (October 2023, In Proceedings of 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023))

Full Text Available
Reconsidering Learning Objectives in Unbiased Recommendation: A Distribution Shift Perspective

https://doi.org/10.1145/3580305.3599487

Xiao, Teng; Chen, Zhengyu; Wang, Suhang (August 2023, In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023))

Full Text Available
Representation Matters When Learning From Biased Feedback in Recommendation

https://doi.org/10.1145/3511808.3557431

Xiao, Teng; Chen, Zhengyu; Wang, Suhang (October 2022, In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM '22))

Full Text Available
Decoupled Self-supervised Learning for Graphs

Xiao, Teng; Chen, Zhengyu; Guo, Zhimeng; Zhuang, Zeyang; Wang, Suhang (December 2022, In Proceedings of Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022))

Full Text Available
